Efficient Matrix Product State Method for periodic boundary conditions 
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We introduce an efficient method to calculate the ground state of one-dimensional lattice models 
with periodic boundary conditions. The method works in the representation of Matrix Product 
States (MPS), related to the Density Matrix Renormalization Group (DMRG) method. It improves 
on a previous approach by Verstraete et ai. We introduce a factorization procedure for long products 
of MPS matrices, which reduces the computational effort from m"' to m^, where m is the matrix 
dimension, and m ~ 100 — 1000 in typical cases. We test the method on the S = ^ and S — 1 
Heisenberg chains. It is also applicable to non-translationally invariant cases. The new method 
makes ground state calculations with periodic boundary conditions about as efficient as traditional 
DMRG calculations for systems with open boundaries. 



One of the most severe problems in condensed mat- 
ter theory is the exponential growth of the Hilbert space 
with system size. This limits many methods such as ex- 
act diagonalization. One strategy that overcomes these 
difficulties is to approximate the ground state in some 
reduced Hilbert space. 

The Density Matrix Renormalization Group 
(DMRG)i"— is one prominent example of such methods. 
By tracing out "unimportant" degrees of freedom, the 
real ground state is approximated in a much smaller 
space. DMRG works much better for open boundary 
conditions (obc) than for periodic boundary conditions 
(pbc). In the worst case where the correlation length is 
much smaller than the system size, if the obc system 
needs rriohc states per block for a given accuracy, the 
pbc system needs 0{mly^^). Since the calculation time 
scales as nv^, the comparable time for pbc is 0(m^,^^). 
However, systems with obc naturally suffer from edge 
effects like Friedel oscillations. An efficient method for 
pbc would be highly desirable. For example, it would 
make finite size scaling easier, and allow the direct 
representation of finite momentum eigenstates^^— . 

It can be shown that the ground state produced by 
DMRG can quite naturally be written in terms of a so 
called matrix product state (MPS)^!^ for both obc and 
pbc. The original work presented an inefficient method 
for computing the MPS, which could not compete with 
DMRG. Recently, a number of new algorithms utilizing 
the MPS state directly have been introduced which are 
efficient and greatly extend the reach of DMRG/MPS 
techniques'^—, including the simulation of random sys- 
tems or a generalization to 2D-systems. In the present 
paper we investigate an algorithm presented in Ref. for 
an MPS treatment of pbc systems. Within this approach 
TOpbc ~ "T-obc, a tremendous improvement. However, 
that algorithm has a computational cost of m^, making 
the net improvement modest. 

Here we introduce an improvement to this pbc MPS 
algorithm based on the approximation of long products 
of certain large (m^ x m^) transfer matrices in terms of 
a singular value decomposition (SVD) with only a few 
singular values. A new circular update procedure allows 



us to work exclusively with such long products. Our ap- 
proach improves the scaling of the algorithm dramatically 
to m^. 

MPS with pbc. We summarize the algorithm presented 
in Ref. |^ and explain some practical aspects. The ground 
state of a quantum mechanical system like a spin model, 
defined on a one dimensional lattice of N sites, can be 
written in terms of an MPS^ 
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where aI^} are sets of d matrices of dimension m x m and 
d is the dimension of the Hilbert space of a single spin 
Si. The trace in eq. ([T|) ensures periodic boundary condi- 
tions. Any state can be written in this form if m is large 
enough; the power of the approach comes from the prop- 
erty that modest m produces excellent approximations 
to ground states of local Hamiltonians. Of course the ex- 
pression above is purely formal and we need a procedure 
to optimize the matrices A^s} ■ For any operator Oi on a 
site i we define the x matrix — 



(2) 



Using these generalized transfer matrices, expectation 
values of products of operators can be easily evaluated 
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The Hamiltonian can also be written using the relation 

above and the matrices a\^} can be optimized one by 
one in order to minimize the energy. Consider the Ising 

model H = erf (8) (^i+i- To optimize matrices A\^} at 
site i, an effective Hamiltonian containing only matrices 
^[11 . . . Al'-il, . . . Al^l can be constructed as fol- 

lows 



H, 



1^ ® /l* + cr^ 
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where I'' is the identity matrix in spin space and 

A; 

^ ^[jj-i]^[j+2]^[i+3] (5) 

i_l _ [j+1] [i+2] p[i-2] 

In the equation above, ah indices are taken modulo N . 
The tilde in eq. (jH refers to the exchange of indices 
^{ij){ki) = X{ik){ji)- Together with a map of the identity 
matrix 7Ve// = l"®iV% N' = E^l'^^^ . . E^^^ E^l^ . . . E^l~^\ 
a new set of d matrices A^j for fixed i is found by solving 
the generalized eigenvalue problem 

ife//Vcc(A) - e7Ve//Vec(A), (6) 

with e the expectation value of the energy and Vec(A) 
the dm^ elements of Als], aligned to a vector. 

When a new set of matrices has been found, the ma- 
trices need to be rcgauged, in order to keep the algo- 
rithm stable. In DMRG this is not necessary since the 
basis of each block is orthogonal. The orthogonality- 
constraint reads ^^^^^^(^K)'^ = 1- It 

can be satisfied 

as follows: The state is left unchanged when we substi- 
tute 4" A^^X = and 4'+" ^ X-^A^l+^\ with 
some nonsingular matrix X. This matrix X has to be 
found such that obeys the normalization condition 
C/i''(f/i'V = 1- We obtain X by calculating the in- 
verse of the square root of Q = ^[''(As^)'''. Since Q 
is not guaranteed to be nonsingular, the pseudo-inverse 
has to be usedi^, by discarding singular values close to 
zero in an SVD of Q. -ffe// '-^^ calculated iteratively^. 
while updating the A-matriccs one site at a time. One 
sweeps back and forth in a DMRG like manner. 

Vidal introduced a different approach, for infinitely 
long translationally invariant systemsii. By assuming 
only two different kinds of matrices A^^^ and A^^l and 
aligning them in alternating order, an algorithm for both 
ground state and time evolution can be constructed that 
updates the matrices in only O(m^) steps. However, un- 
like the periodic MPS method discussed here, Vidal's 
method does not apply to non translationally invariant 
systems (e.g. when impurities or a site dependent mag- 
netic field are studied). In addition, the periodic MPS 
method can be adapted^ to treat excited states, whereas 
the method of Ref. [ll| probably cannot, since the excita- 
tions would be spread over an infinite lattice and would 
have no effect on any individual site. Recently, a re- 
lated approach came to our attentioni^, in which the E- 
matrix of a translationally invariant system is treated 
in O(to^). Also recently, related Quantum Monte Carlo 
variational methods using tensor product states were 
introducedi^i^, with scaling 0{Nm^) per Monte Carlo 
sweep. 

Computational Efficiency. It was shown in Ref. that 
the m needed for pbc systems in the MPS approach 
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FIG. 1: SVD of a product E^^^ . . . E^^ with m = 10. The 
logarithm of the singular values is shown for different I 
in the case of a spin 1 Heisenberg chain of length 100 with 
periodic boundary conditions. The inset shows data for a spin 
i Heisenberg chain. 

is comparable to the m needed in obc systems within 
DMRG. However, it is also vital how CPU-time scales 
with TO. In efficient DMRG programs, most operations 
can be done by computing multiplications of m x m ma- 
trices (see Ref. H, Ch. II. i). 

In contrast, in the MPS-algorithm described above, op- 
erations on TO^ X TO^ matrices need to be done to form 
the products of i?-matrices that represent the Hamilto- 
nian. So one would expect the algorithm to be of order 
O(to^). By taking advantage of the special form of the 
E matrices eq. ([2]), multiplications can be done in O(m^) 
which is, however, still 0{m^) slower than DMRG. 

Decomposition of products. We now introduce an ap- 
proximation in the space of to^ x to^ matrices which re- 
duces the CPU time dramatically while the accuracy of 
the calculation does not suffer. Let us perform a singular 
value decomposition of a long product of i5-matrices 

k=l 

It turns out that the singular values cr^ decay very fast. 
This is shown in Fig. [T] for products of the form Y[i=i 
with various values of I, for the case of the spin 1 Heisen- 
berg chain. One can see that the longer the product the 
faster the singular values decay, roughly exponentially in 
the length I. We therefore propose to approximate long 
products in a reduced basis 

; p 
i=i fe=i 

with p chosen suitably large. In the example of Fig. [TJ we 
would choose p to be 4 at Z = 50. Remarkably, for longer 
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products p can be as small as 2 without a detectable loss 
of accuracy. Thus, the large distance behaviour of the 
ground state of the spin 1 chain is encoded in these two 
numbers, similar to the transfer matrices of a classical 
spin chain. The situation does not change significantly 
when more complicated operators such as the Hamilto- 
nian are decomposed. Of course, the decay of the singular 
values will be model dependent. For a spin i Heiscnberg 
chain we found that the decomposition can be done in 
the same manner with approximately the same number 
of singular values to be kept. 

A multiplication of a product with a new E matrix can 
therefore be doneii in 0{pm^) and a multiplication of 
two terms like ^ can be done in 0{pp'm^). By building 
the effective Hamiltonian out of products in this repre- 
sentation, the iterative evaluation of the eigenvalue prob- 
lem can be accelerated. Whereas in a dense form each 
matrix- vector multiplication - which occurs in eigenvalue 
routines such as Lanczos or Davidson - takes [dm?)'^ op- 
erations, it can now be done in 0{(Ppm?). Note that all 
operations are now done on matrices of size m x m. 

Performing the SVD in . A crucial step is the ef- 
ficient generation of the SVD representation of a large 
TO^ X matrix M in only 0{m^) operations. We de- 
scribe a simple algorithm, with a fixed number of singular 
values (four) to keep the notation simple. Suppose that 
M = UdV, with c? a 4 X 4 diagonal matrix, and that 
multiplication of M by a vector (without using the SVD 
factorization) can be done in 0{m^). To construct U, 
d, and V with 0{nv') operations, wc first form a random 
4 X matrix x, and construct y = xM . The 4 rows of y, 
are linear combinations of the rows of V . Orthonormalize 
them to form y' . Its rows act as a complete ortlionormal 
basis for the rows of V. This means that V = Vy'^y', 
and thus M = My'^y'. Construct z = My''^, and per- 
form an SVD on z: z = UdV. Then M = zy' = UdV, 
where V ~ V'y' . V is row orthogonal because V is or- 
thogonal and y' is row orthogonal. The calculation time 
for the orthogonalization of y and the SVD of z is 0{m?), 
and so the calculation time is dominated by the two mul- 
tiphcations by Af, e.g. roughly 2 x 4 x 0{w?)^ 

In applying this approach to the periodic MPS algo- 
rithm, M is a product of 0{N) i?-matriccs like in cq. [Sj 
which in turn are outer products The multiplication 
with M can be done iteratively in 0{Nm^) operations, 
analoguously to the construction of Hl^^. The calcula- 
tion time is thus 0{Nm?) for each SVD representation 
generated this way. It is only needed a few times per 
sweep (see below). 

A circular algorithm. A speed-up in the simulation 
can only be expected if the number of singular values 
that need to be included is sufficiently small. 

However, in the algorithm of Ref. 1^ one sweeps back 
and forth through the lattice, so that close to the turning 
points, products of only a few E-matrices appear, which 
require more singular values (Fig. [I}. In the extreme 
case of only one i?-matrix, we would have p = rn^. To 
overcome this bottleneck we propose a modified method 
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FIG. 2: Scaling of the circular version of the algorithm. CPU- 
time per sweep (one update of each site) is measured on a 100 
site spin ^ Heisenberg chain for different system sizes. The 
time is fitted to a function m'^ . 



which proceeds through the chain in a circular fashion, 
thus making natural use of the periodic boundary condi- 
tions. Note that we cannot employ multiplications with 
inverse matrices Eq^, since they are too expensive to 
calculate. Wc consider the lattice as a circular ring, and 
divide it into thirds, or "sections". We perform update 
steps for one section at a time. To start one section, 
we first construct the Hamiltonian and other necessary 
operators (see eq. O corresponding to the other two sec- 
tions of the lattice. Only a few such operators are needed. 
Each of them contains products of i?-matrices and is 
computed by an SVD decomposition as described before. 

Then a set of these operators is made by successively 
adding sites from the right most part of the current sec- 
tion to the operators constructed for the section on the 
right, working one's way to the left. Adding a site in- 
volves the multiplication of an E-matrix to the left of 
an operator. These steps can each be done in 0{m^) 
operations. When one has reached the left side of the 
current section, its initialization is finished and one can 
start the normal update steps, now building up a set of 
operators from the left, again in 0{m^) operations. One 
stops when one reaches the right hand side of the current 
section. Then the procedure repeats with the section to 
the right as the new current section. Some of the oper- 
ators previously computed can be reused. The updates 
now go in a circular pattern rather than the usual back 
and forth. 

By proceeding in this way on a system of length N, the 
blocks on which we have to perform an SVD arc of length 
at least N/3 (if we split our system into three parts) , so 
that the SVD will have only few singlular values. Conse- 
quently, the algorithm is expected to scale like 0{Nm^). 

Test and Results. To test our improvements, we stud- 
ied spin 1 and spin i Heiscnberg chains up to length 
N = 100. 

The exact ground state energy for pbc on an 
= 100 chain in the spin 1 case is found to be 
Eq/N = -1.4014840386(5) via a DMRG calculation with 
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FIG. 3: Relative error of the ground state energy of the spin 
1 Heisenberg model versus the dimension m of the reduced 
Hilbert space. DMRG results with obc and pbc are shown, 
as well as Matrix Product State results with pbc. The inset 
shows MPS results with pbc for a spin i Heisenberg chain of 
100 sites. 



m = 2000. The error is generously estimated from the 
truncation error and an extrapolation in m. The periodic 
result differs from the infinite system result (determined 
using long open chains) only in the last decimal place, so 
we will call this value "exact" . 

We discarded singular values smaller than a lO^^^th 
of the largest one. This parameter is chosen such that 
the algorithm remains stable, which is not the case if the 
error bound is chosen too large (10~^ or larger). To de- 
crease the time it takes until convergence is reached, we 
start our calculation with small m and enlarge it after 
the algorithm converged for the current m. This is also 
done in many DMRG programs. We enlarge the matri- 
ces A and increase their rank by filling the new rows and 
columns with small random numbers r, uniformly dis- 
tributed in the interval [— 10~^, 10~^]. The number of 
sweeps it takes until convergence is reached is similar to 
DMRG. For the present model, two or three sweeps are 
enough for each value of m. 

Fig. [2] shows that the algorithm indeed scales like m^, 
and no longer like m^. It is slightly faster on small sys- 



tems, due to faster parts of the algorithm, and becomes 
slightly slower on large systems, likely due to memory ac- 
cess times. Our method (on a periodic system) requires 
a constant factor of about 10 as many operations per it- 
eration as DMRG does on an open system, which is still 
very efficient. 

Finally, we studied the convergence to the exact 
ground state energy as a function of m. We investi- 
gated DMRG with obc and pbc, and the MPS algorithm 
with pbc, both the original version and our improved 
method. The relative error for these cases is plot- 
ted in Fig. [3l The relative error of the spin correlation 
function (not shown) is of similar magnitude with our 
improved method. 

As has been well known, DMRG with obc performs 
much better than with pbc. With the MPS algorithm 
and pbc the relative error as a function of m is compa- 
rable to the error made with DMRG and obc. This has 
already been reported earlier—. The important point here 
is that the error remains the same when we introduce the 
approximations. Also, the number of sweeps until con- 
vergence is reached is similar for DMRG with obc and 
for MPS. We note that the convergence in Fig. [3] is con- 
sistent with exponential behavior in the spin 1 case and 
with a power law for spin i. 

In a typical DMRG calculation, matrix dimensions 
771 ~ 100 — 1000 (and larger) are used. To illustrate the 
computational time scaling, suppose we study a model 
which requires m = 300 states for obc with traditional 
DMRG. Then our new approach gains a factor of roughly 
m^/m^ ~ 10^ over the method of Ref. and even more 
over traditional DMRG. 

In summary, by introducing a well controlled approxi- 
mate representation of products of MPS transfer matrices 
in terms of a singular value decomposition, we have for- 
mulated a circular MPS method for systems with periodic 
boundary conditions, which works with a computational 
effort comparable to that of DMRG with open boundary 
conditions. 



Acknowledgments 

We acknowledge support from the NSF under grant 
DMR-0605444 (SRW) and from NAWI Graz (PP). 



1 S. R. White, Phys. Rev. Lett. 69, 2863 (1992). 
^ S. R. White, Phys. Rev. B 48, 10345 (1993). 
^ U. SchoUwock, Rev. Mod. Phys. 77, 259 (2005). 
* S. Ostlund and S. Rommer, Phys. Rev. Lett. 75, 3537 
(1995). 

^ S. Rommer and S. Ostlund, Phys. Rev. B 55, 2164 (1997). 
® D. Porras, F. Verstraete, and J. I. Cirac, Phys. Rev. B 73, 

014410 (2006). 
^ G. Vidal, Phys. Rev. Lett. 91, 147902 (2003). 



F. Verstraete, D. Porras, and J. I. Cirac, Phys. Rev. Lett. 

93, 227205 (2004). 
® B. Paredes, F. Verstraete, and J. I. Cirac, Phys. Rev. Lett. 

95, 140501 (2005). 
^" F. Verstraete and J. 1. Cirac, 'cond-mat/0407066" (2004). 
" G. Vidal, Phys. Rev. Lett. 98, 070201 (2007). 

N. Schuch, M. M. Wolf, F. Verstraete, and J. I. Cirac, 

Phys. Rev. Lett. 100, 040501 (2008). 

A. W. Sandvik and G. Vidal, Phys. Rev. Lett. 99, 220602 



5 



(2007). 

D. Perez-Garcia, F. Verstraete, M. Wolf, and J. Cirac, 
Quantum Inf. Comput. 7, 401 (2007). 
F. Verstraete, private communication. 
F. Verstraete et al, in preparation. 

Denote v as an vector and V as the elements of v 



aligned as an m x m matrix. Then one can use the rela- 
tion (A (g) B)v — Vec{BV A^) to perform a matrix-vector 
product. 

A Lanczos approach would take an additional factor for 
convergence. 



